Learning When to Simplify Sentences for Natural Text Simplification
Abstract
This paper introduces a corpus-based approach for selecting sentences that require simplification in the context of a Brazilian Portuguese text simplification system. Based on a parallel corpus of original and simplified text versions, we apply a binary classifier to decide in which circumstances a sentence should or should not be split (splitting being the most important syntactic simplification operation), so that the resulting simplified text is natural and not oversimplified. Our classifier reaches 73.5% precision and 73.4% recall when selecting the sentences to be split or kept together.
Related resources
Grammar frequency and simplification: when intuition fails
We investigate whether a medical writer can simplify text by only changing the grammatical structure. Based on a user study, we find that while the sentences look simpler after simplification, they are not easier to understand. For grammatical simplification, better tools are needed to provide more concrete guidance and feedback. Introduction Providing text to patients and health information co...
Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
Text simplification aims to rewrite text into simpler versions, and thus make information accessible to a broader audience. Most previous work simplifies sentences using handcrafted rules aimed at splitting long sentences, or substitutes difficult words using a predefined dictionary. This paper presents a data-driven model based on quasi-synchronous grammar, a formalism that can naturally captur...
Aligning Texts and Knowledge Bases with Semantic Sentence Simplification
Finding the natural language equivalent of structured data is both a challenging and promising task. In particular, an efficient alignment of knowledge bases with texts would benefit many applications, including natural language generation, information retrieval and text simplification. In this paper, we present an approach to build a dataset of triples aligned with equivalent sentences written...
Learning to Simplify Sentences Using Wikipedia
In this paper we examine the sentence simplification problem as an English-to-English translation problem, utilizing a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia. This data set contains the full range of transformation operations including rewording, reordering, insertion and deletion. We introduce a new translation model for text ...
Parallel Sentence Compression
Sentence compression is a way to perform text simplification and is usually handled in a monolingual setting. In this paper, we study ways to extend sentence compression in a bilingual context, where the goal is to obtain parallel compressions of parallel sentences. This can be beneficial for a series of multilingual natural language processing (NLP) tasks. We compare two ways to take bilingual...
Publication date: 2009